Image Processing in Python

A concrete application in histopathology

Quentin Caudron

What is histopathology?

Histopathology is the microscopic examination of tissue samples (from biopsies), to study manifestations of disease.

— Wikipedia

Preparation of tissue samples

The tissue is processed by embedding in paraffin wax, microtomy, and staining with dyes.

Traditionally, analysis often relies on scores

Problems: manual scoring doesn't capture everything

Problems: human labelling is slow and error-prone

Computer-aided diagnosis - improving the manual process

Computer-aided diagnosis - automatic cell counting and contour extraction

Computer-aided diagnosis - tissue segmentation

Application - liver in Soay sheep

Liver structure

Measuring levels of fibrosis, traditionally

The problem - quantifying the extent of inflammation

A traditional image processing approach

We're going to pull primarily from scikit-image, a fantastic Python library for image processing.

We'll also use parts of scipy's ndimage module.

Much like in other areas of traditional machine learning, a lot of image processing is about feature engineering: representing the information you have in a way that is more digestible and processable by a computer. Feature engineering is as much art as science.

In [2]:
# Load the data, and look for a hint of where to start first
widefield_image_file = "data/Sheep11-4x-77.jpg"
image = transform.rescale(io.imread(widefield_image_file), 0.25)

plot_image(image)

The first channel is potentially promising, but those veins might cause issues. Instead of thinking about this in terms of RGB, maybe we can break the image up into its dyes: haematoxylin and eosin.

In [3]:
# Perform a colour deconvolution
heu = color.separate_stains(image, colour_deconv_matrix).astype(float)

plot_image(heu)

The middle channel here looks like a good start. Let's boost the contrast and improve our dynamic range using contrast-limited adaptive histogram equalisation (CLAHE).

In [4]:
# Apply CLAHE on the haematoxylin channel
rescaled = exposure.rescale_intensity(heu[:, :, 1], out_range=(0, 1))
equalised_hist = exposure.equalize_adapthist(rescaled)

plot_image(np.dstack((image, equalised_hist)))

Haematoxylin stains nuclei - and zones of inflammation are just high concentrations of cells, each of which contains a nucleus. We can now pretty easily pick out the zones of inflammation. Still, we need to differentiate between the inflammation and the background, which also contains lots of nuclei.

In [5]:
# Apply Gaussian blur
blurred = filters.gaussian(equalised_hist, 7)

plot_image(np.dstack((image, blurred)))

Applying a blur helped to even things out a little. We're at a point where the inflammation is quite bright, and the background is a medium grey. Let's crank up the contrast to stretch our dynamic range a little.

In [6]:
# Sigmoid transform for contrast
contrast = exposure.adjust_sigmoid(blurred, cutoff=0.6)

plot_image(np.dstack((image, contrast)))

This looks much more promising now. We're probably at a point where a threshold can separate the inflammation from the background.

In [7]:
# Apply an adaptive threshold
thresholded = contrast > filters.threshold_local(contrast, 351, offset=-0.1)

plot_image(np.dstack((image, thresholded)))

We now have something that matches fairly well. However, the contours of the inflammatory zones aren't very "natural", and the zones contain holes. We can clean this up using some morphological operations.

In [8]:
# Use a maximum filter followed by a binary closing to fill any holes
enlarged = maximum_filter(thresholded, 5)
inflammation = morphology.closing(enlarged, morphology.disk(11))

plot_image(np.dstack((image, inflammation)), titles=["Original", "Automated image processing"])

Let's compare with our human expert's mask.

In [9]:
# Load up the human mask and visualise everything together
mask = transform.rescale(io.imread("data/Sheep11-4x-77.jpg_mask.jpg"), 0.25)
all_images = np.dstack((image, mask, inflammation))

plot_image(all_images, titles=["Original", "Human-masked", "Automatic"])

Noisy data, and the need for robustness

How does this algorithm perform?
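One natural way to answer this is to compare the automated mask against the human expert's mask, pixel by pixel. A minimal sketch of two standard overlap metrics, IoU and the Dice coefficient (the function names and toy masks here are my own, not from the talk):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def dice(a, b):
    """Dice coefficient (per-pixel F1) of two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2 * np.logical_and(a, b).sum() / total

# Toy masks: two 4x4 squares overlapping in a 2x4 band
human = np.zeros((6, 6), dtype=bool)
auto = np.zeros((6, 6), dtype=bool)
human[0:4, 0:4] = True
auto[2:6, 0:4] = True

print(iou(human, auto), dice(human, auto))  # IoU = 1/3, Dice = 0.5
```

In practice you would call these on the `mask` and `inflammation` arrays from above (after binarising the loaded mask), and report the scores across many slides to get a sense of robustness.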

Extension : a neural network approach

This is a quick experiment with a type of neural network called a convolutional encoder-decoder. They're some of the state-of-the-art neural network architectures for semantic segmentation: breaking down images into different parts, based on their content.

Over the weekend, I trained a (very untuned) convolutional encoder-decoder. Let's see what it can do.

In [12]:
# Let's use a convolutional encoder-decoder network to find inflammatory zones
from keras.models import load_model

model = load_model("models/histo_convnet.hdf5")
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_206 (Conv2D)          (None, 432, 648, 32)      896       
_________________________________________________________________
conv2d_207 (Conv2D)          (None, 432, 648, 32)      9248      
_________________________________________________________________
conv2d_208 (Conv2D)          (None, 432, 648, 32)      9248      
_________________________________________________________________
max_pooling2d_37 (MaxPooling (None, 216, 324, 32)      0         
_________________________________________________________________
conv2d_209 (Conv2D)          (None, 216, 324, 64)      18496     
_________________________________________________________________
conv2d_210 (Conv2D)          (None, 216, 324, 64)      36928     
_________________________________________________________________
conv2d_211 (Conv2D)          (None, 216, 324, 64)      36928     
_________________________________________________________________
max_pooling2d_38 (MaxPooling (None, 108, 162, 64)      0         
_________________________________________________________________
conv2d_212 (Conv2D)          (None, 108, 162, 256)     147712    
_________________________________________________________________
conv2d_213 (Conv2D)          (None, 108, 162, 256)     590080    
_________________________________________________________________
max_pooling2d_39 (MaxPooling (None, 54, 81, 256)       0         
_________________________________________________________________
conv2d_214 (Conv2D)          (None, 54, 81, 512)       1180160   
_________________________________________________________________
conv2d_215 (Conv2D)          (None, 54, 81, 512)       2359808   
_________________________________________________________________
up_sampling2d_37 (UpSampling (None, 108, 162, 512)     0         
_________________________________________________________________
conv2d_216 (Conv2D)          (None, 108, 162, 256)     1179904   
_________________________________________________________________
conv2d_217 (Conv2D)          (None, 108, 162, 256)     590080    
_________________________________________________________________
up_sampling2d_38 (UpSampling (None, 216, 324, 256)     0         
_________________________________________________________________
conv2d_218 (Conv2D)          (None, 216, 324, 64)      147520    
_________________________________________________________________
conv2d_219 (Conv2D)          (None, 216, 324, 64)      36928     
_________________________________________________________________
conv2d_220 (Conv2D)          (None, 216, 324, 64)      36928     
_________________________________________________________________
up_sampling2d_39 (UpSampling (None, 432, 648, 64)      0         
_________________________________________________________________
conv2d_221 (Conv2D)          (None, 432, 648, 32)      18464     
_________________________________________________________________
conv2d_222 (Conv2D)          (None, 432, 648, 32)      9248      
_________________________________________________________________
conv2d_223 (Conv2D)          (None, 432, 648, 32)      9248      
_________________________________________________________________
conv2d_224 (Conv2D)          (None, 432, 648, 1)       289       
=================================================================
Total params: 6,418,113
Trainable params: 6,418,113
Non-trainable params: 0
_________________________________________________________________
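The summary above follows a characteristic pattern: stacked convolutions with max-pooling to compress the image into a coarse representation (the encoder), then upsampling layers with more convolutions to recover a full-resolution, one-channel probability map (the decoder). A much smaller sketch of the same pattern in Keras (this is my illustrative `build_encoder_decoder` helper, not the 6.4M-parameter model loaded above):

```python
from keras import layers, models

def build_encoder_decoder(height=432, width=648):
    inp = layers.Input(shape=(height, width, 3))
    # Encoder: 3x3 convolutions, halving spatial resolution at each pooling step
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    # Decoder: upsampling back to the input resolution
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    # One-channel sigmoid head: per-pixel probability of inflammation
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return models.Model(inp, out)

model = build_encoder_decoder()
```

Because every layer is convolutional, the output has the same height and width as the input, so the prediction is itself an image: a soft segmentation mask.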
In [13]:
# Use the convnet to find the inflammatory zones, and compare with our work so far
nn_image = transform.rescale(image, 0.5)
pred = model.predict(nn_image.reshape(1, *nn_image.shape)).squeeze()

plot_grid((image, mask, inflammation, pred), ["Original", "Human-masked", "Classical", "ConvNet"])

Next steps with this neural net

  • Hyperparameter tuning
  • Data augmentation
  • Longer training
  • Thresholding the predictions
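On the last point: the network outputs a soft probability map, while the human mask is binary, so a cutoff is needed before comparing them. A minimal sketch with a fixed cutoff, using a synthetic prediction map (the array here is made up; in practice you'd threshold `pred` from the cell above, and could pick the cutoff with something like scikit-image's Otsu threshold instead):

```python
import numpy as np

# Hypothetical soft prediction map in [0, 1], standing in for the convnet output
rng = np.random.default_rng(0)
pred = np.clip(rng.normal(0.2, 0.1, (8, 8)), 0, 1)              # dim background
pred[2:6, 2:6] = np.clip(rng.normal(0.85, 0.05, (4, 4)), 0, 1)  # bright zone

# Fixed cutoff: any pixel above 0.5 is counted as inflammation
binary = pred > 0.5
```

Tuning this cutoff trades precision against recall, which is exactly the kind of knob hyperparameter tuning would explore.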

Image processing in Python

scikit-image is a fantastic library with excellent documentation and a great API.

Keras is my library of choice for quickly developing neural networks, including the one trained above.

scipy's ndimage module contains a number of very useful image processing methods, such as FFT-based convolutions, binary morphological operations, and measurement tools.
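As a quick illustration of the binary morphology tools (toy arrays here, not the sheep data), filling holes and dilating a blob - the same kind of clean-up used on the thresholded inflammation mask above:

```python
import numpy as np
from scipy import ndimage

# A small binary blob with a hole in the middle
blob = np.zeros((7, 7), dtype=bool)
blob[1:6, 1:6] = True
blob[3, 3] = False

# Fill enclosed holes in the foreground
filled = ndimage.binary_fill_holes(blob)

# A 3x3 maximum filter acts as a dilation, growing the blob by one pixel
grown = ndimage.maximum_filter(blob.astype(np.uint8), size=3).astype(bool)

print(filled[3, 3])              # True: the hole is filled
print(grown.sum() > blob.sum())  # True: the blob has grown
```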